Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Added the ability to filter the corresponding files based on the access_time, modify time, and change time of the file for statistics. #399

Merged
merged 1 commit into from
Jun 24, 2024

Conversation

wugeer
Copy link
Contributor

@wugeer wugeer commented May 17, 2024

Background

Filter by file age #340
Add a new feature to dust to filter files based on their last update time, last access time, and last modification time.

In order to achieve this requirement, I referred to the input of the linux find command, and ultimately planned to implement the dust filter by file age function based on the input style of find.

image

I have a little question here, why does dust output files or directories with a file size of 0 by default? The dust tool is used to quickly find those files and directories in the specified directory that take up too much space, and those empty files or directories will only affect the look and feel. Originally, I have implemented filtering out those files or directories with a size of 0, but when I run cargo test When running the test, I found that some samples reported errors, so I canceled this part of the code, but I still wanted to retain the function of filtering out those files or directories with a size of 0. I hope this function can be further explored. thanks

Full Changelogs

  • feat: Added the ability to filter the corresponding files based on the access_time, modify time, and change time of the file for statistics

Issue Reference

fix: Filter by file age #340

Test Result

dust help command output
image

Test Data

image

cargo test run test results

image

Adding --mtime or --atime or --ctime results in actual execution of dust and linux find with similar parameters.

image

image

image

image

image

@bootandy
Copy link
Owner

Apologies I've had a laptop failure. Don't want to risk looking at this on my work laptop for legal reasons. Will review this when I get a new machine.

@@ -37,6 +37,12 @@ _dust() {
'--output-format=[Changes output display size. si will print sizes in powers of 1000. b k m g t kb mb gb tb will print the whole tree in that size.]:FORMAT:(si b k m g t kb mb gb tb)' \
'-S+[Specify memory to use as stack size - use if you see\: '\''fatal runtime error\: stack overflow'\'' (default low memory=1048576, high memory=1073741824)]:STACK_SIZE: ' \
'--stack-size=[Specify memory to use as stack size - use if you see\: '\''fatal runtime error\: stack overflow'\'' (default low memory=1048576, high memory=1073741824)]:STACK_SIZE: ' \
'-M+[File'\''s data was last modified less than, more than or exactly <mtime>*24 hours ago. When dust figures out how many 24-hour periods ago the file was last modified, any fractional part is ignored, so to match -mtime +1, a file has to have been modified at least two days ago.]: : ' \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider making the description more easily readable

Suggested change
'-M+[File'\''s data was last modified less than, more than or exactly <mtime>*24 hours ago. When dust figures out how many 24-hour periods ago the file was last modified, any fractional part is ignored, so to match -mtime +1, a file has to have been modified at least two days ago.]: : ' \
'-M+[Sort by time modifed, include only files modified +/-<mtime> days before/after, days are rounded down, prefix \'\-\' for newer, \'\+\' for older. : : ' \

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, Filter by time mofied is better than Sort by time modifed because this parameter is used for filtering, not sorting. right? :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, you can rephrase it as you wish, just pointed that its a bit too long :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, that description is a bit too long. I rephrased it . I will be very happy if you can help me review the code again. :)

@@ -167,6 +191,9 @@ fn walk(dir: PathBuf, walk_data: &WalkData, depth: usize) -> Option<Node> {
data.is_file(),
walk_data.by_filecount,
depth,
filter_modified_time,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider changing build_node to accept a walk_data argument instead of individually 7 of walk_data's fields

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, you are right, I reorganized the code according to your ideas. I will be very happy if you can help me review the code again. :)

@rindeal
Copy link
Contributor

rindeal commented May 29, 2024

@bootandy: Apologies I've had a laptop failure. Don't want to risk looking at this on my work laptop for legal reasons. Will review this when I get a new machine.

You are always one click away from a GitHub Codespace

@wugeer wugeer requested a review from codeswhite May 30, 2024 14:31
Copy link

@codeswhite codeswhite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!
Should wait for @bootandy to review as well

I still suggest that after merging this we continue to work on a more robust implementation that will support all timeframes (like i explained in #340 )

@wugeer
Copy link
Contributor Author

wugeer commented May 31, 2024

Looks good! Should wait for @bootandy to review as well

I still suggest that after merging this we continue to work on a more robust implementation that will support all timeframes (like i explained in #340 ) All timeframes (like I explained in #340).

That sounds like a good idea. I like it! I'll try it. :)

@wugeer wugeer closed this Jun 11, 2024
@wugeer wugeer reopened this Jun 11, 2024
@wugeer
Copy link
Contributor Author

wugeer commented Jun 11, 2024

Looks good! Should wait for @bootandy to review as well

I still suggest that after merging this we continue to work on a more robust implementation that will support all timeframes (like i explained in #340 )

I've been thinking seriously about this over the past few days. If we modify the tool according to your approach, we'll need to add a parameter to set the upper and lower limits of the filter time. Otherwise, the scenario for equal values would no longer exist. While this method provides great flexibility, I'm concerned about whether such a high level of flexibility is necessary for this tool. Adding this feature might place a significant cognitive burden on the users. After all, people are probably accustomed to using the find command with the -mtime option, which filters by days, or the -mmin option, which filters by minutes. What are your thoughts on this? @bootandy

@bootandy
Copy link
Owner

I have a little question here, why does dust output files or directories with a file size of 0 by default?

Just incase you are running on a very small folder. Some people seem to use dust as a sort of file-finder so I want to include them.

[CompletionResult]::new('-A', 'A ', [CompletionResultType]::ParameterName, 'just like -mtime, but based on file access time')
[CompletionResult]::new('--atime', 'atime', [CompletionResultType]::ParameterName, 'just like -mtime, but based on file access time')
[CompletionResult]::new('-y', 'y', [CompletionResultType]::ParameterName, 'just like -mtime, but based on file change time')
[CompletionResult]::new('--ctime', 'ctime', [CompletionResultType]::ParameterName, 'just like -mtime, but based on file change time')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we need all these - I like the idea of a including a filter that works with modified time. But 'access time' - would someone actually use that? I have to launch 'stat' to even see it.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah we can leave it in if you'd like.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, access time is an effective filtering tool when users want to determine whether a file has been accessed recently based on the access time to determine whether other people are using the file.

Copy link
Owner

@bootandy bootandy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that you based the parameters for this work from find.

We seem to have a few cargo style problems. But overall I think this approach is correct.

depth,
walk_data,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sensible change, there were too many params.

@wugeer
Copy link
Contributor Author

wugeer commented Jun 18, 2024

I have a little question here, why does dust output files or directories with a file size of 0 by default? I have a little question, why does dust output files or directories with a file size of 0 by default?

Just incase you are running on a very small folder. Some people seem to use dust as a sort of file-finder so I want to include them. Some people seem to use dust as a file finding tool, so I wanted to include them.

Yes, you are right! :)

@wugeer
Copy link
Contributor Author

wugeer commented Jun 18, 2024

I like that you based the parameters for this work from find. I like that you based the parameters for this work from find.

We seem to have a few cargo style problems. But overall I think this approach is correct. We seem to have a few cargo style problems. But overall, I think this approach is the right one.

I'm fairly new to Rust , so my coding style may not be very good. If there is anything that needs to be modified, I hope you will give me some advice. thx!

@bootandy
Copy link
Owner

I'm fairly new to Rust , so my coding style may not be very good. If there is anything that needs to be modified, I hope you will give me some advice. thx!

Your coding style looks fine to me. The build however gave style errors ^ see the red 'X's in the build.

On your cmd line:

Run 'cargo fmt' - this neatens things automatically

Run 'cargo clippy' - this is a linter which will suggest fixes and warns of likely bugs, so you'll have to review each one.

you can see the list of errors from the online build here: https://github.com/bootandy/dust/actions/runs/9461768640/job/26334984807?pr=399

@wugeer wugeer force-pushed the feature_xhw branch 2 times, most recently from 0f6e035 to 98e26ba Compare June 19, 2024 08:53
@wugeer
Copy link
Contributor Author

wugeer commented Jun 19, 2024

I'm fairly new to Rust , so my coding style may not be very good. If there is anything that needs to be modified, I hope you will give me some advice. thx!

Your coding style looks fine to me. The build however gave style errors ^ see the red 'X's in the build.

On your cmd line:

Run 'cargo fmt' - this neatens things automatically

Run 'cargo clippy' - this is a linter which will suggest fixes and warns of likely bugs, so you'll have to review each one.

you can see the list of errors from the online build here: https://github.com/bootandy/dust/actions/runs/9461768640/job/26334984807?pr=399

Thank you for your guidance in helping me improve my code to meet the standards. I have resubmitted it, and all checks have passed. :)

@bootandy
Copy link
Owner

this method provides great flexibility, I'm concerned about whether such a high level of flexibility is necessary for this tool. Adding this feature might place a significant cognitive burden on the users. After all, people are probably accustomed to using the find command with the -mtime option, which filters by days, or the -mmin option, which filters by minutes. What are your thoughts on this? @bootandy

I don't have strong opinions on this. But lets stabilise -mtime an -mmin for now. If we want to do the more flexible one we can do that in a followup. It definitely shouldn't go in this PR.

@bootandy
Copy link
Owner

Would you mind doing a rebase or merge - I've accepted a different PR and there are some conflicts ^ - thanks .

@wugeer
Copy link
Contributor Author

wugeer commented Jun 20, 2024

this method provides great flexibility, I'm concerned about whether such a high level of flexibility is necessary for this tool. Adding this feature might place a significant cognitive burden on the users. After all, people are probably accustomed to using the find command with the -mtime option, which filters by days, or the -mmin option, which filters by minutes. What are your thoughts on this? @bootandy

I don't have strong opinions on this. But lets stabilise -mtime an -mmin for now. If we want to do the more flexible one we can do that in a followup. It definitely shouldn't go in this PR.

Indeed, I agree with you. :)

@wugeer
Copy link
Contributor Author

wugeer commented Jun 20, 2024

Would you mind doing a rebase or merge - I've accepted a different PR and there are some conflicts ^ - thanks .

Ok. I'm happy to do it! :)

…e access time, modify time, and change time of the file for statistics
use crate::display::get_number_format;

pub static DAY_SECEONDS: i64 = 24 * 60 * 60;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, due to my carelessness, I did not correct this spelling error in time. I will resubmit a PR for processing.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a problem! we all do it.

@bootandy bootandy merged commit d65f410 into bootandy:master Jun 24, 2024
17 checks passed
@rindeal rindeal mentioned this pull request Jun 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants